VIRTUAL INSERTS IN 3D VIDEO
Patent abstract:
Virtual inserts in 3D video. Modalities refer to 3D video insertions. Virtual camera models allow inserts to be reconciled against the left and right channels of the 3D video to maximize the 3D accuracy and realism of the inserts. Camera models can be composite and can be derived from other models. Camera models can be based on a visual analysis of the 3D video and can be based on 3D camera data including toe-in and eye spacing. Camera data can be derived from information collected using instrumentation connected to a 3D camera system, derived based on visual analysis of the 3D video, or derived using a combination of information collected using instrumentation and visual analysis of the 3D video. Inserts can be adjusted in 3D space and/or transmitted separately to a remote site. Inserts can be adjusted in 3D space based on an insert type, 3D video scene composition, and/or user feedback, including interactive adjustment of 3D inserts and adjustments in view of user sensitivity to visual fatigue.
Publication number: BR112012005477B1
Application number: R112012005477-5
Filing date: 2010-09-10
Publication date: 2021-06-29
Inventors: Sheldon Katz; Gregory House; Howard Kennedy
Applicant: Disney Enterprises, Inc.
IPC main class:
Patent description:
FIELD OF THE INVENTION
Modalities refer to virtual insertions in 3D video.
BACKGROUND OF THE INVENTION
Methods for transmitting video content to viewers can use stereoscopy to project program content into a 3D field. 3D-capable systems can transmit separate channels for left eye and right eye images, providing parallax views of scenes. Although methods for adding virtual inserts to conventional 2D video are known, such 2D methods may not be suitable for providing optimal viewing experiences for 3D video viewers. Consequently, there is a need to provide more realistic virtual inserts in 3D video that look, to viewers, as if they were part of the original production.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings are included to provide further understanding, are incorporated in and constitute a part of this specification, and illustrate embodiments which, together with the description, serve to explain the principles of the invention. In the drawings: FIG. 1 is a schematic illustration of a modality for generating inserts and enhancements in 2D video. FIG. 2 is a schematic illustration of a modality for generating inserts and enhancements in 3D video. FIG. 3A illustrates a first view of an exemplified 3D occlusion method according to an embodiment. FIG. 3B illustrates a second view of the 3D occlusion method exemplified in accordance with FIG. 3A. FIG. 4A illustrates a first view of an exemplified 3D occlusion method in accordance with an embodiment. FIG. 4B illustrates a second view of the 3D occlusion method exemplified in accordance with FIG. 4A. FIG. 5 is a schematic illustration of a modality for generating inserts and enhancements in 3D video. FIG. 6 illustrates an exemplified 3D video production and distribution channel according to an embodiment. FIG. 7 is a block diagram of an exemplified computer system in which modalities can be implemented. The present embodiments will now be described with reference to the accompanying drawings. In the drawings, like reference numerals may indicate functionally similar or identical elements.
DETAILED DESCRIPTION
While the present invention is described herein with respect to illustrative embodiments for particular applications, it should be understood that the invention is not limited thereto. Those skilled in the art having access to the teachings provided herein will recognize additional modifications, applications and embodiments within the scope of the invention, and additional fields in which the invention would be of significant utility. Modalities include inserting enhancements, such as advertising logos, score boxes and the virtual first down line in football games, into 3D video content. Modalities refer to 3D media, including, but not limited to: video, television (over-the-air broadcast, cable, satellite or fiber optic), cinema, the Internet, mobile devices (cellular phones or other wireless devices), and other 3D video streaming platforms. Inserts and enhancements in 2D video can be integrated with the video so that they realistically appear to be part of the original video. Inserts can be implemented, for example, as described in U.S. Patent No. 5,264,933 to Rosser et al., filed January 28, 1992 and entitled "Television Displays Having Selected Inserted Indicia", the contents of which are incorporated in their entirety herein by reference. A virtual insertion system for 2D video can use any number of search techniques to recognize a scene and build a virtual camera model of the scene in order to add the virtual inserts.
A camera model can include a camera position and other parameters, which make it possible for a camera to be located in relation to a scene. Once the scene is recognized, subsequent frame models can be calculated by any number of methods to accompany the video scenes. Occlusion processing makes it possible for foreground objects within the video to block inserts added to the scene background. Occlusion calculations can be performed based on the color of the scene, for example in “chroma key” systems. The insert needs to be rendered using a commercially available graphics renderer, for example, before being blended with the program video. The modalities cited here can be used in combination with motion systems, where motion information is extracted from a scene and used to match an insertion motion to the scene motion. The camera model can contain multiple parameters that refer to the physical measurements of a camera mounted on a tripod, such as pan, tilt, roll, image distance, x position, y position, and z position. Other parameters, such as parameters for radial lens distortion, for example, can be used. Camera data parameters can be derived from data collected using instrumentation connected to a 3D camera system, can be derived based on visual analysis of the 3D video, or can be derived using a combination of data collected using instrumentation connected to the system. of 3D camera and the visual analysis of 3D video. The camera model itself may contain part of all the information needed to describe the field of view of one or both channels of 3D video. For example, it can contain a single parameter such as zoom or the distance of the image associated with either the right channel or the left channel. Alternative single channel parameters include, but are not limited to focus, rotation, lens distortion, etc. Camera data parameters determined for a channel view can be independently derived from another channel view. Also, camera models can be limited to one or more parameters associated with both 3D video channels (camera position, pan rotation, tilt). Camera data parameters can be determined using similarities between individual channel views. Furthermore, camera models can be limited to parameters that describe the relationship of the left and right channels (eye spacing, angle of convergence, etc.). Camera data parameters can be determined using the differences between individual channel views. It should be understood that camera data parameters can be represented with a wide range of measurement units or devices. A composite camera model for 3D video can be comprised of camera models for the individual channels of 3D video. In one embodiment of this invention, a camera model can be represented as one or two 3x3 matrices. In another modality, the camera model can be generated using other dimensional matrices. The elements or parameters of the matrix representation can be considered camera data parameters. Array parameters can include external camera parameters such as camera position coordinates and internal parameters such as horizontal and vertical sensor scale factors. Other methods, such as, for example, methods based on homography, can be used and the present invention does not have a particular device to calculate a camera model. For example, the camera model may simply provide a homographic relationship between the current camera channel views and some physical reference, such as the plane containing the basketball court. 
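For the homography case just mentioned, the following sketch (Python/NumPy, with invented matrix values and point coordinates rather than anything taken from this disclosure) shows how a per-channel 3x3 homography can map a point on the court plane into pixel coordinates for the left and right channels; the right-channel matrix is derived here by a crude horizontal offset standing in for the eye spacing.

```python
import numpy as np

def project_with_homography(H, court_xy):
    """Map a point on the court plane (metres) to pixel coordinates
    using a 3x3 homography H for one video channel."""
    p = H @ np.array([court_xy[0], court_xy[1], 1.0])
    return p[:2] / p[2]  # perspective divide

# Hypothetical homographies for the left and right channels of a 3D rig.
H_left = np.array([[120.0,  -8.0, 960.0],
                   [  4.0, -95.0, 540.0],
                   [  0.0, -0.02,   1.0]])
H_right = H_left.copy()
H_right[0, 2] -= 40.0   # crude horizontal offset standing in for eye spacing

corner = (5.0, 3.0)      # a point on the court, in court coordinates
print(project_with_homography(H_left, corner))
print(project_with_homography(H_right, corner))
```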
In another example, the camera model might include a homographic mapping between the input view and a reference image of the scene, where the reference image is used to define the location of a graphical insert. The elements or parameters of the homographic mapping can be considered the camera data parameters. In additional embodiments of this invention, the camera models can be a representation of the location of an object, a group of objects, or a part of the scene in the 3D video channels. As with all camera models, the object's location in the field of view can be updated over time. Modalities for adding virtual inserts to 2D video can be adapted to generate separate inserts for the left eye and right eye video channels, as needed for use in 3D video systems. Such modalities can address the insertion errors that can occur in each of the left and right channel camera models. These model errors can occur due to noisy pixel data when searching or tracking video channels, for example. In another example, trackers using template blocks may have a random component, that is, blocks may be randomly selected, and may not provide consistent behavior across channels. When adding virtual inserts to 2D video, for example, a search model error can cause a virtual advertisement located on a basketball court to be misplaced by about 0.5 meters on the court. 2D video viewers may not even find this unpleasant, especially if the logo position is relatively far from nearby prominent features, such as intersecting court lines. In 3D video, however, similar misplacement errors in the left eye and right eye images can be considered unpleasant, because the left eye and right eye images are then mislocated relative to each other, especially if the errors do not tend to track each other (see the numerical sketch below). Consequently, relating the left eye and right eye camera models to a reference, or maintaining a relative difference between the right and left eye models, as described here, can enhance the experience of viewers watching virtual inserts in 3D video. FIG. 1 is a schematic illustration of a modality for generating inserts and enhancements in 2D video so that they can realistically appear to viewers as being part of the original video. A video source, such as the program video feed, is the input to the subsystem as video input 101. Video input 101 can be modified to include inserts and enhancements, and output as video output 121. A main controller 103 represents a hardware and/or software module that can control and coordinate subsystem blocks 103-113. Search block 105 represents a hardware and/or software module that can analyze video input 101 to calculate camera models and compute scene geometry for program video scenes. Tracking block 107 represents a hardware and/or software module that can track objects within the program video to reduce processing requirements for search block 105 and enable smoother tracking of inserts and enhancements associated with a background plane of video input 101. Occlusion block 109 represents a hardware and/or software module that can determine when foreground objects should block inserts and enhancements, and generate an occlusion key that enables mixer 113 to display inserts and enhancements with proper occlusion. Render block 111 represents a hardware and/or software module that can receive the camera models, insert locations, occlusion keys and other information to render inserts and enhancements for mixing into video output 121.
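As a numerical illustration of the depth-sensitivity point above (hypothetical focal length, eye spacing and pixel positions, not values from this disclosure), the pinhole-stereo relation depth = focal x eye spacing / disparity shows why an error in only one channel is far more disturbing than the same error applied to both channels:

```python
# Apparent depth of an insert from its left/right screen positions,
# using the pinhole-stereo relation depth = focal * baseline / disparity.
# Focal length, baseline (eye spacing) and pixel positions are illustrative.
focal_px = 2000.0       # focal length in pixels
baseline_m = 0.07       # camera eye spacing in metres

def apparent_depth(x_left_px, x_right_px):
    disparity = x_left_px - x_right_px
    return focal_px * baseline_m / disparity

# Intended placement: disparity of 10 px -> 14 m from the camera.
print(apparent_depth(1000.0, 990.0))    # 14.0

# A 3 px error in the right channel alone shifts the insert to 20 m,
# while the same 3 px error applied to BOTH channels leaves depth unchanged.
print(apparent_depth(1000.0, 993.0))    # 20.0
print(apparent_depth(1003.0, 993.0))    # 14.0
```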
Any type real or virtual graphics that are combined or otherwise blended with the 3D video can be considered to insert an enhancement into the 3D video. This exemplified modality is for illustration only and the modalities can be implemented with various other architectures comprising hardware, software or combinations of hardware and software for one or more blocks. Multiple render blocks 111 and mixers 113 can, for example, serve as back-end processors to provide multiple versions of inserts and enhancements to different viewers. In some embodiments, the Seek block 105 and the Track block 107 can be combined, for example. In other embodiments, the Search block 105, the Track block 107 and the Occlusion block 109 can be combined. Basic approaches to generating the camera models can include physical sensors connected to the camera system, image processing and computer vision analysis of the video channels, or a combination of physical sensor measurements and video analytics processing. Vision processing can be employed by Search block 105, Track block 107 or Occlusion block 109 in FIG. 1. For Search block 105, visual analysis can be used to derive image or screen locations from visual features in the scene. Camera models can be generated by particular frames by associating the image feature locations and their corresponding 3D scene positions. Such methods are described in the US Patent Application. No. 12/659,628, the contents of which are fully incorporated herein by reference. For Tracking block 107, visual analysis can be used to track the location of features or points of interest between frames of an image sequence. An exemplified operation is described in the US Patent Application. No. 6,741,725 by Astle, the contents of which are incorporated in their entirety herein by reference. For Occlusion block 109, visual analysis can be used to distinguish the foreground pixels of an image from the background pixels. A color-based method is described by Jeffers et al in US Patent. No. 7,015,978, the contents of which are incorporated in their entirety herein by reference. FIG. 2 illustrates a schematic representation of an exemplified modality for generating inserts and enhancements in 3D video. The 290 controller can employ the methods used to add virtual inserts to 2D video. Controller 290 represents a hardware and/or software module that can interface with video processing units for both the left and right channels. The program video for the left eye video channel is input, as the left video input 201, to the corresponding search 205, tracking 207, occlusion 209, and mixer 213 subsystems. The program video for the video channel from the right eye is inserted, as the right video input at 251, to the corresponding search subsystems 255, tracking 257, occlusion 259 and mixer 263. The left/right video input 201, 251 can be modified to include inserts and enhancements, and output as left video output 221 and right video output 271, respectively. Controller 290 can control and coordinate the various subsystem blocks. The Search blocks 205, 255 represent the hardware and/or software modules that can analyze the left/right video inputs 201, 251, and calculate the camera models for the program video scenes. 
Tracking blocks 207, 257 represent the hardware and/or software modules that can track objects within the video to reduce processing requirements for search blocks 205, 255, and enable smoother tracking of inserts and enhancements with respect to the backgrounds of the left and right video inputs 201, 251. Occlusion blocks 209, 259 represent the hardware and/or software modules that can determine when foreground objects should block inserts and enhancements, and generate an occlusion key that enables mixers 213, 263 to display inserts and enhancements with proper occlusion. Render blocks 211, 261 represent the hardware and/or software modules that can receive the camera and other models, the insert locations, occlusion keys and other information to render the inserts and enhancements for mixing into the left/right video outputs 221, 271. Controller 290 may include a model manager 292, which monitors search blocks 205, 255 and tracking blocks 207, 257 to determine current camera model information for the left and right video channels. Model manager 292 can compare the camera models for the left and right video channels in order to reconcile the left and right camera models. For example, model manager 292 can calculate an average/reference camera model that has a camera position, in 3D coordinates, midway between the left channel and right channel camera models. In some cases, it may be preferable to use the left channel or right channel camera model as a common reference. Using a common or average reference camera model associated with both the left and right video channels can reduce the effects of camera model mismatch between the left and right channels. The left and right channel camera models, for example, can be offset by fixed amounts or distances from the common reference camera model. As an example, the left and right channel camera models can be made to have fixed spatial distances, in 3D coordinates, from the 3D coordinates of the common reference camera model. The distance between the left and right camera models can, for example, correspond to the distance between the left and right camera lenses for a known 3D camera system. The distance between the camera lenses, or the eye distance or spacing, may vary during video sequences, but an average distance may be adequate for some applications. For other applications, modeling the eye spacing more accurately with known formulas or approximations, for example, may be desirable. In another example, the offsets from the common reference camera model can be calculated using methods for calculating the parallax between stereoscopic images. This can be done by visually analyzing the left and right channels individually, or by visually analyzing the left and right channels together. Stereoscopic parallax analysis can be used to determine or derive the relationship between the 3D video channels. Camera data parameters can be derived based on parallax analysis or stereoscopic analysis of the 3D video channels. Cross-channel reconciliation can be used for a subset of parameters as well. For example, zoom or magnification data can be reconciled based on an average zoom value before the left and right camera models are reconciled. In this example, the zoom data could be noise filtered before being applied to calculate the camera models. Alternatively, a least squares fit can be employed to find a best match to the input parameters.
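A minimal sketch of this reconciliation step, under the simplifying assumption that each camera model is reduced to a 3D camera position and that an average eye spacing is known (names and numbers are hypothetical):

```python
import numpy as np

def reconcile_positions(left_pos, right_pos, eye_spacing):
    """Reconcile noisy left/right camera positions against a common
    reference midway between them, then re-impose a fixed eye spacing."""
    left_pos = np.asarray(left_pos, dtype=float)
    right_pos = np.asarray(right_pos, dtype=float)
    reference = 0.5 * (left_pos + right_pos)          # common reference model
    baseline = right_pos - left_pos
    norm = np.linalg.norm(baseline)
    direction = baseline / norm if norm > 0 else np.array([1.0, 0.0, 0.0])
    half = 0.5 * eye_spacing * direction
    return reference - half, reference + half          # reconciled left, right

# Independently searched positions (metres), jittered by model noise.
left_model = [10.02, 4.98, 2.51]
right_model = [10.11, 5.01, 2.49]
print(reconcile_positions(left_model, right_model, eye_spacing=0.07))
```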
Limiting physical parameters to validate expected ranges is another mechanism that can be employed in the reconciliation process. This can apply to individual points in time as well as over a period of time. For example, the rate of change of a particular parameter, such as zoom, can be limited or smoothed. This can be achieved in part through image processing of 3D video channels or through signal processing of physical sensor measurements. Reconciliation may use known filtering techniques, statistical methods, thresholding methods, or other approaches. Reconciliation can be applied to individual camera data parameters or to a grouping of camera data parameters. A grouping of camera data parameters, such as a composite camera model, can be consistent or otherwise reconciled with one or more camera data parameters. In some embodiments, a composite camera model and one or more camera data parameters are consistent, or otherwise reconciled with initial estimates for one or more individual camera data parameters. Reconciliation may involve producing a camera model or camera data parameters consistent with other camera data parameters. In one embodiment, one or more camera data parameters or camera models that are reconciled with a first camera data parameter can be generated simultaneously with the first camera data parameter. In another embodiment, one or more camera data parameters or camera models that are reconciled with a first camera data parameter are generated sequentially after the first camera data parameter. In alternative modalities, one or more camera data parameters or camera models that are reconciled with a first and second camera data parameter are generated either simultaneously or sequentially after the generation of the first and second camera data parameter . Reconciliation can be based on 3D video channels, visual analysis of 3D channels, camera parameters derived from 3D video channels, sensor measurements or camera parameters of the 3D camera system, or any combination of the above. Reconciliation is not limited to a particular method or group of methods. For some 3D applications, it may be desirable to use more frequent search models compared to 2D applications, to minimize the shifts that can occur with scene tracking. It may also be desirable to minimize the relative displacements between the left and right channels to each other. Search accuracy for 3D insertion applications is desirable in view of the potential errors associated with 3D objects converging to incorrect locations within 3D scenes, for example. Such errors can make inaccurate inserts look remarkably unnatural in 3D, in contrast to localization errors in 2D images. For example, a convergence error for a first 3D descent line in a television football game can cause the field line to appear above or below the playing field. The left and right channel inserts of the first descent line must match in length as well as have correct positions, or the ends of the line may not look natural in 3D. Additional types of errors, for example unforeseen errors, can cause objects to move suddenly in 3D space. Size mismatches can cause color errors or other appearance problems. In such cases, the 292 model manager can improve performance by taking into account camera model differences between the left and right channels. Tracking blocks 207, 257 can use 2D methods to track scenes, such as texture model methods. (See, for example, US Patent No. 
6,741,725 by Astle, entitled "Motion Tracking Using Image-Texture Templates"). Using visual analysis of the 3D video channels, texture templates or tracking blocks can be selected within scenes for generating scene and tracking models. Tracking methods can use 2D information in scenes using the left and right channel 2D texture templates. Other tracking methods can use 2D scene texture, but use 3D position information for the tracking blocks. Such methods can be called 3D tracking methods, even though they use 2D texture templates. In other cases, 3D information derived from the stereoscopic views of the left and right channels can be used. 3D tracking blocks based on voxels, or 3D pixels, can be used for tracking 3D scenes. Such methods can be extended to other techniques such as optical flow. For many applications, however, 2D processing can be adequate, and it minimizes complexity and cost. In some modalities, an object, a group of objects, or a part of the scene can be tracked in individual 3D video channels, or tracked jointly across both channels simultaneously. Using voxels, some errors of 2D tracking methods can be avoided. For example, 2D template tracking may fail when too many template blocks, relative to background blocks, lie on moving foreground objects in scenes. When such foreground objects move relative to the background, incorrect camera models can be calculated. This can happen, for example, during a televised basketball game, when the camera zooms in on the players and the tracking uses blocks on the players. Using voxels with known 3D coordinates makes it possible to select background voxels for tracking based on their 3D position. In the example above, the voxels can be selected on the floor plane of the court or on the plane of the bleachers holding spectators, for example. Similar to search, tracking can benefit from model manager 292 taking into account model differences between channels. Performance gains can also be realized by limiting 2D block or voxel searches to within the constraints defined by the left and right channel relationships. Searching for blocks or voxels over similar regions makes it possible to use more tracking elements, giving better tracking accuracy and performance. The above analysis can be achieved through visual analysis of the 3D channels, sensor measurements from the 3D camera system, or a combination of visual analysis and sensor measurements. The use of voxels can be part of a reconciliation of the visual analysis or sensor measurements associated with the 3D video channels. Occlusion blocks 209, 259 can perform occlusion processing. Occlusion processing can, for example, be done using methods such as chroma key. For 3D video, occlusion processing can also use 3D information from the scene. Pixels within a scene can, for example, be related between the left and right channels using methods such as pattern matching. The 3D position information for the corresponding left and right channel pixels can then be calculated using, for example, epipolar geometry techniques. Once the 3D position information for the pixels is determined, an occlusion subsystem can determine whether or not those pixels should be blocked by foreground objects. As an example, if a foreground pixel block is determined to be located closer to the 3D camera than a background pixel block in the scene, the foreground pixel block could block the background pixel block.
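One way to carry out the depth comparison just described is sketched below on synthetic data (the dense disparity map, focal length and eye spacing are placeholders; a real system would obtain disparity from stereo matching between the channels): pixels estimated to be closer to the camera than the virtual depth of the insert are masked out so that the insert does not bleed over foreground objects.

```python
import numpy as np

def occlusion_mask(disparity_px, focal_px, eye_spacing_m, insert_depth_m):
    """True where the scene is closer to the camera than the virtual insert,
    i.e. where the insert should be occluded (removed)."""
    scene_depth = focal_px * eye_spacing_m / np.maximum(disparity_px, 1e-6)
    return scene_depth < insert_depth_m

# Synthetic 4x6 disparity map: larger disparity = closer to the camera.
disparity = np.full((4, 6), 8.0)        # background (playing surface)
disparity[1:3, 2:4] = 20.0              # a foreground object (e.g. a wristband)

mask = occlusion_mask(disparity, focal_px=2000.0, eye_spacing_m=0.07,
                      insert_depth_m=12.0)
print(mask.astype(int))

# Compositing: keep program video where mask is True, insert pixels elsewhere.
video = np.zeros((4, 6))
insert = np.ones((4, 6))
composite = np.where(mask, video, insert)
print(composite)
```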
FIG. 3A illustrates a first view of an exemplified 3D occlusion method according to an embodiment. A green-colored bandage on a player's wrist is shown as bandage 306 in a left eye channel 302 and a right eye channel 304. Using a chroma key method on each of the left/right eye channels 302, 304, bandage 306 may be difficult to distinguish from the green color of the playing field in background 310, which may increase the probability of bleeding. However, the modalities can use a parallax method to distinguish bandage 306 from background 310, even when similar colors, such as the color of bandage 306 and background 310, are involved. Parallax methods can also be used in conjunction with chroma key methods. FIG. 3B illustrates a second view of the exemplified 3D occlusion method of FIG. 3A. Using parallax, the 3D position of the player's green wrist bandage 306 can be determined to be closer to camera/viewer 312 than the similarly green playing field 310. A virtual insert, such as a yellow first down line located behind the player on playing field 310, can then be blocked by bandage 306 based on the parallax determination. Inserts and enhancements using parallax can prevent unnatural "bleeding" of the inserted first down line over bandage 306. FIG. 4A illustrates a first view of an exemplified 3D occlusion method according to an embodiment, where spatial information for multiple players 406, 408 is used. Using a search, for example, players 406, 408 are found in the left and right channels 402, 404. The positions of players 406, 408, including their distances from camera/viewer 412, can be determined using parallax. The playing field 410 can appear in the background and can be blocked by the players and by virtual inserts. As illustrated in FIG. 4B, insert 414 can be blocked by player 406, who is closer to camera/viewer 412, but not by player 408, who is farther from camera/viewer 412. Virtual insert 414 may appear to be between players 406 and 408 within the 3D scene, with no bleeding into the background of playing field 410 behind players 406, 408. This method can be extended to any plurality of players or objects within 3D scenes. Stereoscopic visual analysis between the left and right views of 3D video can allow the generation of a depth map or mask, where pixels or regions of the video scene are represented by depth measurements. Various methods for generating depth maps from stereoscopic views can be used. When the depth map precisely follows the outline of objects within the scene, it can be used to generate an occlusion mask to remove sections of the inserted graphics. The removed sections are prevented from blocking objects in the foreground, allowing those objects to appear in front of the inserted graphic. The mask can, for example, be derived from the depth map by making a pixel-by-pixel comparison between the effective distance from the camera of an inserted graphic pixel and the distance from the camera to the point in the scene associated with that pixel. The inserted graphic can, for example, be removed where the object or scene pixel is closer to the camera than the virtual position of the graphic. The inserted graphic can, for example, be placed into the video where the object or scene pixel is farther from the camera than the graphic's virtual position. Standard graphics that overlay video in 2D sports broadcasts can present additional challenges in 3D video productions.
The graphics can include fixed-score graphical layers, sometimes called score bars or “Fox boxes”, which can continuously display the current game time, score, and relevant game information. The graphics may also include temporary flash graphics, sometimes called lower third graphics, which provide background information about players in games. One approach to embedding such graphics in 3D video might be to make the graphics appear at fixed locations or distances from the camera. However, this may not be pleasant for viewers and in some cases can cause eye strain sometimes associated with 3D viewing. Whether the graphics look pleasing to viewers may depend on the depths of objects and the background in the 3D video scene at any given time or over an entire period of time. Positioning objects and background at greater viewing distances can make it easier for viewers to focus, and thus reduce the viewer's eye strain. Additionally, graphics located relatively close to the camera and/or far away from scene elements, for example, far in front of the screen/display plane and/or close to viewers, may distract them from the scene and/or may appear to viewers irrelevant to the scene. Integrating graphics within the scene, however, can alleviate such problems. Modalities can use camera models to guide the location of virtual graphics, including fit graphics, in 3D video scenes so they can look more pleasing to viewers. Camera models, along with physical models, can allow the depth range of objects in scenes to be determined. In one modality, the fit graphics can be placed at the location of the plane/display screen, appearing to be located at the same distance from the viewer as the plane/display screen. In other modalities, fit graphs can be placed at comparable relative distances to the objects being previewed, or slightly in front of or behind the objects being previewed. Graph locations can differ based on the composition of a scene. For example, locating a graph on a tall camera with a wide surveillance shot of a football game might differ from locating a graph for a field level, zoomed in on a group of players on the field. In another modality, the adjustment graphics can be located at a depth beyond the objects or playing surface in the scenes, appearing at a relatively large distance from the camera. Camera models and search methods can be used to determine screen locations that are likely to be unlocked by players or referees, or algorithms can find unlocked areas directly. In additional modalities, graphics depth can be fixed for a given camera based on expected operational coverage, such as in surveillance view versus single player coverage. This can be confirmed in a systematic way using counting signals or listening to the game's audio call by the production director, for example. In another modality, camera models can be used to assess the suitability of the scene for graphic overlays taking into account the 3D distance from the camera. Graphs can be selectively enabled using various criteria in order to be pleasing to observers. Modalities can be extended to other information that can be inserted into 3D video scenes. Closed captions or text can be inserted and integrated within 3D scenes. Inserts can be positioned to minimize eye strain or for other aesthetic or functional reasons. Dialog text, for example, can be located near the speakers in scenes. Metadata within video streams can, for example, make it possible to automatically find “closed caption” text within scenes. 
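As one concrete way to realize the depth-based placement discussed above (assuming a simple stereo camera model; all parameter values are hypothetical), the desired apparent distance of a graphic can be converted into the horizontal offset to apply between its left- and right-channel renders:

```python
def channel_offset_px(depth_m, focal_px=1800.0, eye_spacing_m=0.065,
                      convergence_m=None):
    """Horizontal offset x_left - x_right (pixels) between the left- and
    right-channel renders of a graphic placed at depth_m. Parallel rig by
    default; if a convergence distance is given, the offset is measured
    relative to that plane (zero offset = at the screen plane)."""
    d = focal_px * eye_spacing_m / depth_m
    if convergence_m is not None:
        d -= focal_px * eye_spacing_m / convergence_m
    return d

# A score box meant to sit at the convergence (screen) plane gets zero offset;
# pushing it to 40 m places it behind the screen plane for easier focus.
print(channel_offset_px(15.0, convergence_m=15.0))   # 0.0
print(channel_offset_px(40.0, convergence_m=15.0))   # negative: behind screen
```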
The placement of the virtual insert can be controlled by observers, and can be implemented, for example, as described in the US Patent Application Publication. No. 2010/0050082 to Katz et al, filed August 13, 2009 and entitled “Interactive Video Insertions, and Applications Thereof, the contents of which are incorporated in their entirety herein by reference. Viewers prone or sensitive to eye strain when watching 3D video may choose, for example, to prefer inserts at longer viewing distances. Modalities can use virtual graphics embedded in scenes to present various types of data in 3D video so that the data looks pleasing to viewers. Game status information can be presented as an alphanumeric graphic integrated into the playing field, for example. In one sport, such data may be presented at a fixed field location, such as near the pitchers in a baseball game, or as part of the center circle or near the baseline in a football game. In another embodiment, an information graph may be attached to other virtual graphs, such as distance and/or descent graphs, which are associated with the location of the first descent line or scrimmage graph line. Information graphics can be displayed at alternative locations in the television production. This could include the back wall or bleachers of a baseball game, or a brand suspended from the top floor of a stadium structure in a football game production. Locating virtual inserts at longer viewing distances can reduce eye strain and may reduce eye focus requirements after relatively close focus periods For some viewers who are sensitive to 3D video and who may develop headaches, focus on larger distances can reduce unpleasant symptoms. Having the ability to control virtual insertion distances enables video productions to reduce eye strain and other symptoms associated with 3D video. Parameters for the 3D video system include eye spacing and convergence angle. The eye spacing is the distance between the lenses, and the convergence angle is the relative viewing angle between the lenses. Parameters can be manually controlled by an operator. This can be done by an individual designated to support one or more 3D camera operators. Motors can move cameras to adjust parameters. Parameters can be determined based on object distance and other scene information. Operators can determine parameters relying on experience with similar scenes, using familiar guidelines, using live image screens, or using other techniques. Cameras or camera controllers can calculate these parameters based on lookup tables or parameters such as viewing distance, viewing angle, scene geometry, etc. to determine camera settings. Eye spacing and convergence angle can be computed directly by visually matching and recording the feature points in the background between the two stereo channels. The known camera's eye spacing and convergence angle can be incorporated into the calculations for the 3D insertion models. These parameters can be embedded within the video as metadata or can be sent, for example, via a data channel directly to a virtual insertion system. In a modality using camera data associated with 3D camera settings/parameters, a controller can embed the 3D camera data in the vertical blanking range of the video recording produced by the camera. 3D camera data can include eye spacing, convergence angle, zoom, focus, extender, and other 3D camera parameters or signals. 
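A sketch of how such 3D camera data might be serialized for carriage alongside the video (the field list, ordering and checksum here are invented for illustration and do not correspond to any actual broadcast ancillary-data standard):

```python
import struct

# Hypothetical fixed-layout payload: frame counter plus six camera parameters.
CAMERA_DATA_FORMAT = "<IffffffH"   # little-endian, trailing checksum field

def pack_camera_data(frame, eye_spacing, convergence_deg, pan, tilt, zoom, focus):
    body = struct.pack("<Iffffff", frame, eye_spacing, convergence_deg,
                       pan, tilt, zoom, focus)
    checksum = sum(body) & 0xFFFF            # toy checksum for illustration
    return body + struct.pack("<H", checksum)

def unpack_camera_data(payload):
    *fields, checksum = struct.unpack(CAMERA_DATA_FORMAT, payload)
    assert checksum == sum(payload[:-2]) & 0xFFFF, "corrupted camera data"
    return fields

packet = pack_camera_data(1234, 0.065, 1.2, 31.5, -4.8, 42.0, 18.3)
print(len(packet), unpack_camera_data(packet))
```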
Additional data from a 3D camera mount, such as rotation and tilt data similar to data from systems used for 2D video, can also be included in the 3D camera data. Such embedded 3D camera data can be routed with the video to remote locations, such as broadcast studios, for 3D virtual inserts. Video insertion regions, or insertion methods, can be selected to ensure data integrity at the final destination for the virtual insertion system. In another example, camera data can be encoded within unused audio channels, within the horizontal blanking region, or within the horizontal auxiliary data (HANC) region of the video. Other types of metadata, except camera data, can be embedded in the video to enable virtual 3D inserts at downstream stages in a video production and/or distribution channel. In one example, 4 points can define a target area for virtual inserts for each of the left and right channels. These 8 points define a 3D rectangular plane that can be used for insertions at a later stage. Another number of points or alternate representation, such as edges or curves or parametric curves, can be used to designate the location of inserted enhancements. In another example, a 3D target location marker can be inserted into the video and then replaced downstream. The target location marker can represent image coordinates to insert an enhancement to a particular frame or keyframe. The target location marker for intermediate frames may be interpolated or otherwise determined from the target location marker at key frames within temporal proximity to the intermediate frame. Alternatively, the target location marker can represent the image coordinates of a physical object in the scene, such as a football field, used to determine the image coordinates of an enhancement or insert. Coordinates can be embedded or otherwise encoded in the 3D video in such a way that they do not affect the portion of the video used for the active view. This can include unused audio channels within the horizontal blanking region or within the horizontal auxiliary data (HANC) region of the video. The location marker can be processed so that regions of occlusion are not included in the location marker. In other examples, insertion locations can be encoded in metadata and occlusion key masks that are encoded separately. Several similar methods can be invented by those skilled in the art. In some cases, it may be preferable to process the video upstream, for example, near the event location, as opposed to downstream, for example, in a studio. As an example, video available at an event can be uncompressed, while video in a studio can be compressed to facilitate satellite transmission. It may be that occlusion processing provides better results using uncompressed video, for example. In another example, it may be desirable to keep smaller equipment in a studio by transferring at least some processing to stages upstream of the studio. For some applications, it may be desirable to use a combination of camera data and vision processing to calculate camera/insert models, or to use vision processing alone to calculate models and derive scene/camera system parameters. In one example, rotation and pitch data can be used to provide an approximate search location for a view-based system that could refine the search using view-based methods. In another example, an approximate search location could be derived by vision processing, eye spacing and convergence angle used from the 3D camera. 
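Returning to the target location markers described above, the corner-point idea can be sketched as follows (hypothetical pixel coordinates): four image points per channel are carried at key frames, and intermediate frames interpolate them before a downstream system renders the insert into the marked quadrilateral.

```python
import numpy as np

def interpolate_corners(key_a, key_b, frame_a, frame_b, frame):
    """Linearly interpolate a 4x2 array of corner points between two keyframes."""
    t = (frame - frame_a) / float(frame_b - frame_a)
    return (1.0 - t) * np.asarray(key_a) + t * np.asarray(key_b)

# Target quadrilateral for the left channel at two key frames (pixels).
left_f100 = [[400, 300], [600, 300], [610, 420], [390, 420]]
left_f110 = [[420, 310], [620, 310], [630, 430], [410, 430]]

print(interpolate_corners(left_f100, left_f110, 100, 110, frame=104))
# The right channel carries its own four points, so the pair of
# quadrilaterals (8 points in total) pins the insert's position in 3D.
```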
Various combinations can be advantageous for different applications and signal availability. Vision-based processing can be achieved by visually analyzing the video channels of the 3D video. Channels can be processed independently or in combination. Camera models for individual channels or the channels themselves can be used individually or together to calculate eye spacing or convergence angle parameters for 3D camera systems. Calculated parameters can be used to control 3D cameras or to increase manual camera operator control. These settings can be refined when video changes in response to new settings and new camera parameters/models are calculated. Boundaries can be placed on parameters such as zoom to avoid framing that might be unpleasant to viewers. The calculation of eye spacing and convergence angle can automatically enable faster setup of 3D cameras and provide more consistent results and settings. Automatic parameter calculation can save production costs by minimizing labor. 3D productions may favor relatively close-up views of scenes to give viewers a stronger sense of the 3D structure. Relatively long views, in which objects are at relatively large viewing distances, may look more like 2D video to viewers, and it can be considered that 3D productions are not necessary for such views. Framing action shots at shorter distances, such as during football television broadcasts, for example, can provide challenges to quickly set up or read 3D cameras. For such cases, modalities can be used to automatically adjust 3D camera parameters for varying scene shots. Using these methods, a long soccer pass can be covered with narrower variations of a long pass shot, as opposed to a more consistently wide shot, for example. In another modality for virtual inserts, it may be desirable to locate the inserts on a particular plane, such as, for example, a 3D first descent line in a 3D football broadcast. The left and right channel inserts need to be positioned correctly within the frames so that the final insert will converge to the correct 3D position within the scene. Position errors can cause the first line of descent to appear either over or under the field plane, for example. To avoid such problems, the left and right channel insertion positions can be monitored and adjusted to ensure that the final 3D insertion converges on a particular plane in the scene. In some cases it may be acceptable or preferable to err on one side of a plane, such as above a playing field as opposed to below the field, for example. FIG. 5 is a schematic illustration of a modality for generating the inserts and enhancements in 3D video, demonstrating that 3D video can be manipulated using an integrated approach. The input, 3D video input 501, may comprise separate left and right channels such as separate High Definition Serial Digital Interface (HD-SDI) feeds, or may comprise a composite feed having the left and right channels interlaced. Interlacing can include anamorphically compressing the left and right channels into a single HD-SDI stream, or using an alternative scheme to combine the feeds. The 501 3D video input can be modified to include the inserts and/or enhancements, and output as the 521 video output. The inserts and/or enhancements may appear to viewers to be realistically part of the original video. A main controller (integrated main control interface 503) can control and coordinate subsystem blocks 503-513. 
Other schemes of combining or encoding the individual channels into a composite stream are possible and may be based on video compression methods. The built-in search block 505 can analyze 501 3D video input and calculate camera models and analyze scene geometry for program video scenes. Camera models and analysis can be derived from a single channel and extrapolated to the second channel, derived from a single channel and refined by second channel processing, computed from both channels with an optimally matched model , to both views, or any combination/permutation of the above. For the integrated search block 505, visual analysis can be used to derive image location from visual features in the left and right channels. A composite camera model can be generated for a particular frame by associating image locations of scene features in the 3D video channels and by corresponding 3D position of scene features. A composite camera model can be generated by reconciling the derived feature locations for the individual channels. This can be achieved, for example, by computing the least squares error fit for mapping between feature image locations and 3D scene locations. The built-in tracking block 507 can update models based on a single view and extrapolate the second view. The built-in tracking block 507 can update models based on a single view refined by the second view, update models directly to optimally combine both views, or update models based on any combination/permutation of the above. Visual analytics can be used by the built-in tracking block 507 to track the location of features or points of interest between frames of an image sequence. This can be performed in conjunction with physical sensor measurements. Integrated tracking block 507 can generate a composite model in a similar way to controller 290's model manager 292 (FIG. 2), or it can track an object or part of an object such as a hockey player's helmet. In one modality, the object can be tracked and the scene can be tracked independently, such that the object's position relative to the scene can be derived for a graphical effect such as placing a rail behind the object, by example, a player. Furthermore, the built-in tracking block 507 can use the data extracted by the main camera instrumentation (rotation, tilt, eye spacing, convergence angle, etc.) and by the extracted lens information (zoom, focus, duplicator, focal length, convergence optical axis, etc.), communicating or producing electrical connections to the camera and/or lenses. In one modality, the pre-tap calibration process is used to derive the relationship between the left and right views. In another modality, the left and right channel models are derived based on knowledge of the physical camera, eg based on knowledge of left and right channel camera sensors, lens parameters, etc. At runtime, the camera model can be computed for one channel and the calibration model can be used to derive the camera model for another channel. The built-in occlusion block 509 can determine when foreground objects should block inserts and enhancements in the 3D video by generating a mask key associated with each channel. For the “chroma key” methodology, one or both channels can be used to determine the ideal color of the insertion region, and this can be used to generate independent keys for both channels. Matching background pixels on both channels can be used to smooth out noise from shot or other occlusion objects in mask keys for a particular camera channel. 
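The least-squares fit mentioned for the integrated search block can be sketched with a standard direct linear transform, under the assumption that the matched scene features lie on a known reference plane (synthetic correspondences below; not the disclosure's own implementation):

```python
import numpy as np

def fit_homography(scene_xy, image_xy):
    """Least-squares (DLT) homography mapping plane points to image points."""
    rows = []
    for (X, Y), (u, v) in zip(scene_xy, image_xy):
        rows.append([-X, -Y, -1, 0, 0, 0, u * X, u * Y, u])
        rows.append([0, 0, 0, -X, -Y, -1, v * X, v * Y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

# Synthetic check: generate image points from a known homography, then refit.
H_true = np.array([[100.0, -5.0, 950.0], [2.0, -90.0, 530.0], [0.0, -0.01, 1.0]])
scene = [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0), (0.0, 5.0), (5.0, 2.0)]
image = []
for X, Y in scene:
    p = H_true @ np.array([X, Y, 1.0])
    image.append((p[0] / p[2], p[1] / p[2]))
print(np.round(fit_homography(scene, image), 3))
```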
Alternatively, occlusion masks for both channels can be directly computed from stereoscopic depth maps. Masks for both channels can be processed to ensure that the same corresponding pixels for each channel are selected for masking. Having some pixels masked in one channel and not masked in the other can result in objects such as insertion color errors or other objects generated by the improper combination of left and right channels. Visual analysis can be employed by the integrated occlusion block 509 to generate a mask key for each channel. The built-in renderblock 511 can perform stereoscopic rendering of inserts based on a composite model determined from the left and right channel models. Graphic rendering engines can be used to generate simultaneous left and right channels for the virtual inserts embedded in 3D video. Key blending of occlusion masks with graphic keys can be implemented by the built-in 511 render block and possibly also the final blend of 3D video with 3D fill channels. Also, mixing can be implemented using an independent integrated mixer block 513, which can be comprised of two broadcast video mixers. In some embodiments, mixing can be implemented by a single broadcast mixer if the left and right channels are interlaced in a standard video format such as HD-SDI. In one embodiment, the built-in render block 511 can render the visual elements according to camera models determined by the built-in search block 505 and the built-in tracking block 507. In one example, the visual element might be a three-dimensional object, and the built-in renderblock 511 can render the three-dimensional object to appear within the video. In this example, the integrated 511 renderer can render the dynamic/animated three-dimensional figures using three-dimensional modeling techniques, including, for example, texture loading, virtual camera modeling and rendering to a viewport. Alternatively, the rendered 3D object can be static, such as a 3D representation of the first descent line. Three-dimensional rendering techniques can be used, such as those in game applications. In other examples, the visual element inserted into the 3D video input 501 may be an image, video, graphic, text, or advertisement (such as an advertising logo). Visual elements can be generated using character fonts, allowing inserts to be derived from data sources such as game data transmission channels or player position statistics during sporting events. Virtual elements that are blended or otherwise blended with 3D video can be considered to be inserting an enhancement into the 3D video. Visual elements inserted into the 3D video input 501 can accompany the background scenes, such as an insertion of the locked virtual first line of descent to a background of the football playing surface. A visual element can accompany a point of interest, such as rings placed on players' feet during a sporting event. A portion of a graphic can follow a point of interest within the video, such as an arrow pointing to a player at a sporting event where only the arrowhead follows the location point of interest. 3D graphic insertion can be relative to both background scenes and foreground points of interest, for example, when graphics mark the trail (path) of a moving player in a video stream. In this case, the track points - foot position over time - are first initialized based on the tracking of the point of interest and then updated to compensate for camera movement. 
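A sketch of the trail effect just described, under the simplifying assumptions that tracked foot positions have been back-projected onto the ground plane and that a per-frame plane homography is available for each channel (all values hypothetical): trail points are stored once in field coordinates and re-projected with the current frame's model, so the graphic stays locked to the scene as the camera moves.

```python
import numpy as np

def project(H, points_xy):
    """Project Nx2 ground-plane points through a 3x3 homography."""
    pts = np.hstack([points_xy, np.ones((len(points_xy), 1))])
    img = (H @ pts.T).T
    return img[:, :2] / img[:, 2:3]

# Trail of foot positions accumulated in field coordinates (metres).
trail_field = np.array([[20.0, 10.0], [20.8, 10.3], [21.7, 10.5], [22.5, 10.9]])

# Current-frame homographies for the two channels (placeholder values).
H_left = np.array([[90.0, -6.0, 940.0], [3.0, -80.0, 520.0], [0.0, -0.015, 1.0]])
H_right = H_left.copy(); H_right[0, 2] -= 35.0

# Re-render the trail every frame with the latest models, so the graphic
# tracks the background rather than the screen.
print(np.round(project(H_left, trail_field), 1))
print(np.round(project(H_right, trail_field), 1))
```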
In one modality, the graphics of a 3D drawing system for television commentators ("telestrator"), which diagrammatically depict the movements of players, for example, are superimposed in 3D on the playing surface. In another embodiment, the graphics of the drawing system for television commentators can be rendered at a set distance from the camera. This might work well for some applications, but it may be limiting in others that could benefit from the spatial information of the scene. As an example, an operator of the television commentator drawing system might place circles around players based on distance from the camera or based on the plane of the screen. For some players, such circles may appear to encircle the players, but for other players the circles may appear to float above them. Placing such circles based on the 3D position within the scene, such as near the players' feet, can provide improved perspective relationships between the players and the circles. Similar issues can apply to other graphics, including arrows pointing to players. For example, arrows placed at a set distance behind the screen plane may not appear to be closely or obviously attached to the players. In a specific modality, the graphics of the 3D drawing system for television commentators can be positioned and/or generated in the 3D video based in part on user commands captured using a manual interface (touch screen, mouse, gaming device, tablet, etc.). The graphics of the 3D drawing system for television commentators can be made to track the 3D scenes, such as arrows that follow the players, for example. Scene tracking for the 3D drawing system for television commentators is possible using the methods described here, including the use of camera eye spacing and convergence angle data, for example. In an illustrative example, circles could be inserted around players' waists. Using 2D methods, placing circles in scenes close to the players' feet could result in circles that do not appear to be associated with the players; precise placement of the circles in 3D space can fix this. Graphics from the 3D drawing system for television commentators that are combined or otherwise blended with the 3D video can be considered an insertion of an enhancement into the 3D video.
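To illustrate the circles-at-the-feet example (hypothetical camera geometry; a simple look-at pinhole model stands in for the tracked camera model), a ring can be generated on the field plane around a player's position and projected into both channels so that it converges at field level rather than at the screen plane:

```python
import numpy as np

def look_at_camera(focal_px, centre_px, cam_pos, target):
    """Build a 3x4 pinhole projection matrix for a camera at cam_pos looking
    at target, with the image x axis kept horizontal (world z is up)."""
    cam_pos = np.asarray(cam_pos, float)
    z = np.asarray(target, float) - cam_pos
    z /= np.linalg.norm(z)                       # optical axis
    x = np.cross(z, [0.0, 0.0, 1.0]); x /= np.linalg.norm(x)
    y = np.cross(z, x)                           # image "down" direction
    R = np.vstack([x, y, z])
    K = np.array([[focal_px, 0.0, centre_px[0]],
                  [0.0, focal_px, centre_px[1]],
                  [0.0, 0.0, 1.0]])
    return K @ np.hstack([R, -R @ cam_pos.reshape(3, 1)])

def project(P, pts_xyz):
    h = np.hstack([pts_xyz, np.ones((len(pts_xyz), 1))])
    img = (P @ h.T).T
    return img[:, :2] / img[:, 2:3]

# Ring of points on the field plane (z = 0) around a player's feet.
angles = np.linspace(0.0, 2.0 * np.pi, 24, endpoint=False)
ring = np.column_stack([12.0 + np.cos(angles), 30.0 + np.sin(angles),
                        np.zeros_like(angles)])

# Left and right cameras separated by a hypothetical 6.5 cm eye spacing,
# both converged on the player's position.
P_left = look_at_camera(1800.0, (960.0, 540.0), (11.967, -20.0, 15.0), (12.0, 30.0, 0.0))
P_right = look_at_camera(1800.0, (960.0, 540.0), (12.033, -20.0, 15.0), (12.0, 30.0, 0.0))
print(np.round(project(P_left, ring)[:3], 1))
print(np.round(project(P_right, ring)[:3], 1))
```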
FIG. 6 illustrates an exemplified 3D video production and distribution channel according to an embodiment. Virtual inserts/enhancements in the 3D video using camera data and other information can be provided at different stages of the 3D video channel, as illustrated in FIG. 6. The competition at a sports venue 602 can be covered by multiple 3D video cameras 604, with the 3D video feeds sent to an on-site production 606. The 3D virtual insertion system can modify a dedicated 3D video camera feed in the 3D insertion system 608 upstream of the on-site production 606 left and right channel switchers, for example. The system may modify the on-site 3D video program feed in the 3D insertion system 610 downstream of the on-site production 606. Camera data information can be extracted by the 3D camera instrumentation or the 3D camera system, which can include the lens, controller, and tripod head. Camera data information can be provided to the 3D insertion system via a data connection or by encoding the information in the video format. Camera model information can be extracted directly by video analysis, or by a combination of video analysis and camera sensors. The 3D video feed can be transmitted by streaming video 612 to a remote location such as a broadcast production studio 614, where virtual inserts can be integrated into the 3D video using a 3D insertion system 616. Camera data parameters can be transmitted from an on-site production stage to a remote location, where the data are received and used to integrate an enhancement into the 3D video. The 3D video with virtual inserts can be distributed through a 3D video distribution 620, where it can be delivered to platforms including television 622, the Internet 624, or mobile devices 626. In one embodiment, virtual inserts are integrated into the 3D video at a location remote from the on-site production, using video analysis of one or both video stream channels. The remote location may include, but is not limited to, a broadcast studio, a regional cable broadcasting station, a local cable broadcasting station, a cable node, an Internet-to-TV connection device, a computer system, and a mobile device. In another embodiment, video analysis can take place on-site or at a remote location, such as, but not limited to, a studio or a regional cable broadcasting station. The information can be propagated downstream in the distribution chain to where the insertion is integrated (regional cable broadcasting station, local cable broadcasting station, cable node, Internet-to-TV connection device). In yet another embodiment, camera sensor information can be derived from the 3D video camera(s) and sent from the venue to a remote location to be used by a virtual insertion system to integrate the graphics into the 3D video.
Examples such devices may include a program cartridge and cartridge interface (such as found in video game devices), a removable memory chip (such as an EPROM, or PROM) and associated socket, and other removable storage units 722 and interfaces 720 that allow software and data to be transferred from removable storage drive 722 to computer system 700. Computer system 700 may also include a communication interface 724. Communication interface 724 allows software and data to be transferred between computer system 700 and external devices. Communication interface 724 can include a modem, a network interface (e.g., an Ethernet card), a communications port, a slot and a PCMCIA card, or the like. The software and data transferred via the communication interface 724 is provided to the communication interface 724 itself via the communication path 726. The communication path 726 can be implemented using wire or cable, optical fibers, telephone line, cellular telephone link , RF link or other communication channels. In this document, the terms "computer program media" and "computer-used media" are used generally to refer to media such as removable storage drive 718, removable storage drive 722, and a hard drive installed in the storage drive. hard disk 712. The computer program medium and the computer used medium may refer to memories such as main memory 708 and secondary memory 710, which may be memory semiconductors (e.g., DRAMs, etc.) These computer program products are devices that provide software for the 700 computer system. Computer programs (also called computer control logic) are stored in main memory 708 and/or secondary memory 710. Computer programs may also be received via communications interface 724. Such computer programs, when executed , enable computer system 700 to implement the modalities discussed here, such as the system described above. In particular, the computer programs, when executed, enable the processor 704 to implement the modalities processes. Accordingly, such computer programs represent the controllers of computer system 700. When embodiments are implemented using software, it may be stored in a computer program product and loaded into computer system 700 using removable storage unit 714, the 720 interface, the 712 hard disk drive, or the 724 communications interface. The systems, apparatus and methods for 3D video insertions and their applications are described above. It will be clear that the Detailed Description section, not the Summary, is intended to be used to interpret the claims. The abstract may present one or more, but not all, of the exemplified embodiments of the present invention as noted by the inventors, and thus, is not intended to limit the present invention and the appended claims in any way. The modalities were described above with the aid of functional building blocks that illustrate the implementation of specific functions and their relationships. The boundaries of these functional building blocks have been arbitrarily defined here for the convenience of description. Alternative limits can be defined as long as their specific functions and relationships are properly performed. The foregoing description of the specific embodiments will fully reveal the general nature of the invention which others may, applying knowledge of the art, readily modify and/or adapt for the various applications of such specific embodiments, without undue experimentation, without abandoning the general concept of the present invention. 
Therefore, such adaptations and modifications are intended to be within the meaning and range of equivalents of the described modalities, based on the teaching and guidance presented here. It is to be understood that the phraseology or terminology herein is for the purpose of description and not of limitation, so that the terminology or phraseology of the present specification is to be interpreted by those skilled in the art in light of the teachings and guidance. The breadth and scope of the present invention should not be limited by any of the exemplified embodiments described above, but should be defined only in accordance with the claims and their equivalents.
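To illustrate the hand-off described above, in which camera data parameters are produced at the on-site production stage and an enhancement is integrated farther down the distribution chain, a minimal sketch follows. It is illustrative only and not part of the specification: the Python field names, the JSON encoding, and the UDP transport are assumptions introduced for this example, not a transport defined by the patent.

```python
# Illustrative sketch only: the field names, JSON encoding, and UDP transport are
# assumptions for this example, not a transport defined by the specification.
import json
import socket


def encode_camera_data(frame_number: int, left_params: dict, right_params: dict) -> bytes:
    """Bundle one 3D frame's per-channel camera data parameters for transmission."""
    message = {
        "frame": frame_number,
        "left": left_params,    # e.g. {"pan": ..., "tilt": ..., "focal_px": ...}
        "right": right_params,
    }
    return json.dumps(message).encode("utf-8")


def send_downstream(payload: bytes, host: str, port: int) -> None:
    """On-site: send the parameters out-of-band, alongside the 3D video feed."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


def receive_camera_data(payload: bytes) -> tuple:
    """Downstream (studio, cable head-end, set-top device): recover the parameters."""
    message = json.loads(payload.decode("utf-8"))
    return message["frame"], message["left"], message["right"]
```

Keying each parameter set to a frame number lets the downstream insert system re-associate the data with the matching left-channel and right-channel images before the enhancement is integrated.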
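Likewise, the reconciliation of two per-channel camera models into a composite model, and the placement of an enhancement so that it appears at a consistent 3D position in both channels (as recited in the claims below), can be pictured with the following sketch. The ChannelCamera and CompositeCamera fields, the simple averaging used for reconciliation, and the toed-in parallax approximation f · b · (1/Z − 1/Zc) are assumptions introduced for illustration, not the claimed method.

```python
# Illustrative sketch only: hypothetical camera-model fields, simple averaging for
# reconciliation, and the usual toed-in parallax approximation. Not the claimed method.
from dataclasses import dataclass


@dataclass
class ChannelCamera:
    """Simplified model of one channel (left or right) of a 3D feed."""
    focal_px: float        # focal length, in pixels
    center_x: float        # principal point, horizontal (pixels)
    center_y: float        # principal point, vertical (pixels)
    position_x: float      # camera centre along the stereo baseline (metres)
    convergence_m: float   # distance at which the channel's axis meets the rig centreline


@dataclass
class CompositeCamera:
    """Composite model reconciled from the left and right channel models."""
    focal_px: float
    center_x: float
    center_y: float
    eye_spacing_m: float   # baseline between the two camera centres
    convergence_m: float   # distance of zero on-screen parallax


def reconcile(left: ChannelCamera, right: ChannelCamera) -> CompositeCamera:
    """Reconcile the per-channel camera data parameters into one composite model."""
    return CompositeCamera(
        focal_px=(left.focal_px + right.focal_px) / 2.0,
        center_x=(left.center_x + right.center_x) / 2.0,
        center_y=(left.center_y + right.center_y) / 2.0,
        eye_spacing_m=abs(right.position_x - left.position_x),
        convergence_m=(left.convergence_m + right.convergence_m) / 2.0,
    )


def insert_positions(model: CompositeCamera, u: float, v: float, depth_m: float):
    """Left- and right-channel screen positions of an insert anchored depth_m from the rig.

    For a toed-in rig, a point at the convergence distance has zero parallax; the
    parallax of other points is approximately f * b * (1/Z - 1/Zc).
    """
    parallax = model.focal_px * model.eye_spacing_m * (1.0 / depth_m - 1.0 / model.convergence_m)
    left_uv = (u + parallax / 2.0, v)
    right_uv = (u - parallax / 2.0, v)
    return left_uv, right_uv


if __name__ == "__main__":
    left = ChannelCamera(1800.0, 960.0, 540.0, -0.03, 12.0)
    right = ChannelCamera(1800.0, 960.0, 540.0, 0.03, 12.0)
    composite = reconcile(left, right)
    # Anchor a virtual logo 8 m from the rig, centred in the frame.
    print(insert_positions(composite, 960.0, 540.0, 8.0))
```

In this convention, an insert anchored nearer than the convergence distance receives crossed parallax (shifted right in the left channel and left in the right channel), which is what makes it appear in front of the screen plane to the viewer.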
Claims:
Claims (13)
[0001] 1. Method characterized by comprising: - determining a first camera data parameter of a first camera model associated with a first channel of a 3D video, wherein the first camera model describes the field of view of the first channel; - determining a second camera data parameter of a second camera model associated with a second channel of the 3D video, wherein the second camera model describes the field of view of the second channel; - wherein the determination of the first camera data parameter and the second camera data parameter is based on a search analysis of at least the first channel, the search analysis being based on voxels corresponding to at least the first channel; - generating a composite camera model by reconciling the first camera data parameter from the first camera model and the second camera data parameter from the second camera model; and - inserting an enhancement into the 3D video based on the composite camera model.
[0002] 2. Method according to claim 1, characterized in that it further comprises reconciling the first camera data parameter and the second camera data parameter.
[0003] 3. Method according to claim 1, characterized in that determining the first camera data parameter and the second camera data parameter is further based on visual analysis of at least one of the first and second channels of the 3D video.
[0004] 4. Method according to claim 1, characterized in that the camera data parameters include the eye spacing and the convergence angle.
[0005] 5. Method according to claim 1, characterized in that it further comprises automatically calibrating a 3D camera system associated with the 3D video based on the camera data parameters.
[0006] 6. Method according to claim 1, characterized in that the first 3D video channel and the second 3D video channel are obtained from a 3D camera system associated with the 3D video.
[0007] 7. Method according to claim 6, characterized in that it further comprises restricting the search analysis based on a relationship between the first and second 3D video channels.
[0008] 8. Method according to claim 1, characterized in that it further comprises updating the first and second camera data parameters based on at least a tracking analysis of the first channel.
[0009] 9. Method according to claim 1, characterized in that it further comprises occluding the enhancement based on the composite camera model.
[0010] 10. Method according to claim 1, characterized in that it further comprises interactively positioning the enhancement at a 3D location according to received input.
[0011] 11. Method according to claim 1, characterized in that the enhancement is a three-dimensionally rendered visual element.
[0012] 12. Method according to claim 1, characterized in that it further comprises automatically positioning the enhancement at a 3D location according to a 3D video scene composition and the type of enhancement.
[0013] 13. Method characterized by comprising: - receiving a first camera data parameter associated with a first camera model, wherein the first camera model is associated with a first channel of a 3D video and describes the field of view of the first channel; - receiving a second camera data parameter associated with a second camera model, wherein the second camera model is associated with a second channel of the 3D video and describes the field of view of the second channel; - wherein the determination of the first camera data parameter and the second camera data parameter is based on a search analysis of at least the first channel, the search analysis being based on voxels corresponding to at least the first channel; - generating a composite camera model based on at least the first camera data parameter and the second camera data parameter; and - inserting an enhancement into the 3D video based on the composite camera model, wherein the insertion is performed remotely from an on-site production pipeline stage.
Family patents:
Publication number | Publication date
US10652519B2 | 2020-05-12
JP2013504938A | 2013-02-07
CN102726051A | 2012-10-10
US20110216167A1 | 2011-09-08
BR112012005477A2 | 2016-04-19
EP2476259A1 | 2012-07-18
WO2011031968A1 | 2011-03-17
JP5801812B2 | 2015-10-28
MX2012002948A | 2012-07-17
CN102726051B | 2016-02-03
Legal status:
2016-05-24 | B08F | Application dismissed because of non-payment of annual fees [chapter 8.6 patent gazette] | Free format text: Referring to the 3rd, 4th and 5th annuities.
2016-09-06 | B08G | Application fees: restoration [chapter 8.7 patent gazette]
2016-09-27 | B08F | Application dismissed because of non-payment of annual fees [chapter 8.6 patent gazette] | Free format text: Referring to the 6th annuity.
2016-11-01 | B08G | Application fees: restoration [chapter 8.7 patent gazette]
2019-01-08 | B06F | Objections, documents and/or translations needed after an examination request [chapter 6.6 patent gazette]
2020-03-03 | B06U | Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]
2021-05-11 | B09A | Decision: intention to grant [chapter 9.1 patent gazette]
2021-06-29 | B16A | Patent or certificate of addition of invention granted [chapter 16.1 patent gazette] | Free format text: Term of validity: 20 (twenty) years counted from 10/09/2010, subject to the legal conditions. Patent granted in accordance with ADI 5.529/DF, which determines the change of the term of grant.
Priority:
Application number | Filing date | Title
US24168709P | 2009-09-11
US61/241,687 | 2009-09-11
PCT/US2010/048427 | WO2011031968A1 | 2009-09-11 | 2010-09-10 | Virtual insertions in 3D video